Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion


Deepfakes can pose real threats to society, especially when targeted at people in power. To counter the problem, researchers are developing new detection methods aimed at identifying instances of image and video falsification.


A recent paper published on arXiv.org proposes a method to detect video falsifications related to a person's identity. The researchers' goal is to verify whether the person purportedly "seen" in a video is indeed that person. The problem is thus broader than deepfake detection alone and covers both deepfakes and falsified pristine (unmanipulated) videos.

A semantic, multimodal detection approach is proposed to solve the task. It integrates speech transcripts into person-specific gesture analysis and relies on the intuition that everyone has unique patterns in how their speech, facial expressions, and gestures co-occur. A comparative study against several fake-video detection methods across multiple fake types shows that the proposed approach generalizes strongly to all of them.
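
To make that intuition concrete, here is a minimal, hypothetical sketch (not the authors' code) of person-specific, word-conditioned analysis: reference statistics of facial motion are built per spoken word from genuine footage of a speaker, and a test video is scored by how far its word-conditioned motion deviates from that reference. The function names, feature representation, and thresholding strategy are all illustrative assumptions.

```python
# Hypothetical sketch of the core intuition: flag a video as falsified when the
# facial motion accompanying each spoken word deviates from reference statistics
# learned from genuine footage of the same speaker. All names and the distance
# measure are illustrative assumptions, not the paper's implementation.

import numpy as np

def build_reference_profile(word_motion_pairs):
    """Average facial-motion feature vector per word, from genuine videos.

    word_motion_pairs: iterable of (word, feature_vector) tuples, where the
    feature vector summarizes facial motion while the word is being spoken.
    """
    sums, counts = {}, {}
    for word, feats in word_motion_pairs:
        sums[word] = sums.get(word, 0.0) + np.asarray(feats, dtype=float)
        counts[word] = counts.get(word, 0) + 1
    return {word: sums[word] / counts[word] for word in sums}

def anomaly_score(profile, test_pairs):
    """Mean distance between test-video motion and the speaker's profile,
    computed only over words that were seen when building the reference."""
    dists = [np.linalg.norm(np.asarray(feats, dtype=float) - profile[word])
             for word, feats in test_pairs if word in profile]
    return float(np.mean(dists)) if dists else float("nan")

# A large score suggests the word/motion correspondence is atypical for this
# speaker, i.e. the video may be falsified; a decision threshold would be
# chosen on held-out genuine and fake videos of the same person.
```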


A deepfake - artistic impression. Image credit: ApolitikNow via Flickr, CC BY-NC-SA 2.0

In today's era of digital misinformation, we are increasingly faced with new threats posed by video falsification techniques. Such falsifications range from cheapfakes (e.g., lookalikes or audio dubbing) to deepfakes (e.g., sophisticated AI media synthesis methods), which are becoming perceptually indistinguishable from real videos. To tackle this challenge, we propose a multi-modal semantic forensic approach to discover clues that go beyond detecting discrepancies in visual quality, thereby handling both simpler cheapfakes and visually persuasive deepfakes. In this work, our goal is to verify that the purported person seen in the video is indeed themselves by detecting anomalous correspondences between their facial movements and the words they are saying. We leverage the idea of attribution to learn person-specific biometric patterns that distinguish a given speaker from others. We use interpretable Action Units (AUs) to capture a person's face and head movement as opposed to deep CNN visual features, and we are the first to use word-conditioned facial motion analysis. Unlike existing person-specific approaches, our method is also effective against attacks that focus on lip manipulation. We further demonstrate our method's effectiveness on a range of fakes not seen in training, including those without video manipulation, which were not addressed in prior work.
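
As a companion to the sketch above, the following hypothetical snippet illustrates how word-conditioned Action Unit features might be assembled: per-frame AU intensities (such as those produced by a facial-behavior toolkit like OpenFace) are pooled over each word's time span obtained from a forced-aligned transcript. The input formats, parameter names, and mean pooling are assumptions made for illustration, not the paper's implementation.

```python
# Hypothetical sketch of word-conditioned Action Unit (AU) features: per-frame
# AU intensities are pooled over the time span of each spoken word taken from a
# forced-aligned transcript. Field names and mean pooling are illustrative.

import numpy as np

def word_conditioned_aus(au_frames, fps, word_alignments):
    """Pool per-frame AU intensities over each spoken word.

    au_frames:       array of shape (num_frames, num_AUs) with AU intensities.
    fps:             video frame rate, used to map word times to frame indices.
    word_alignments: list of (word, start_sec, end_sec) tuples from alignment.

    Returns a list of (word, pooled_AU_vector) pairs, i.e. the input expected
    by build_reference_profile() in the sketch above.
    """
    pairs = []
    for word, start, end in word_alignments:
        lo = int(start * fps)
        hi = max(lo + 1, int(end * fps))           # at least one frame per word
        segment = au_frames[lo:hi]
        if len(segment):
            pairs.append((word, segment.mean(axis=0)))  # mean AU activation
    return pairs
```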